Turmeric rhizome is a major agricultural commodity in Erode and contributes significantly to the region's trade and production: the region accounts for nearly 60 percent of India’s turmeric output and is widely known as the “Turmeric Capital of the World”.
However, although this importance is well acknowledged, farmers in Erode have yet to move away from traditional guesswork and subjective evaluation when estimating future crop yield. The purpose of this preliminary study is to determine whether turmeric yield can be predicted from data that are easily obtainable by farmers and agricultural experts working in Erode district. For this purpose, a small but comprehensive dataset was compiled containing values for eight variables: soil pH; soil nitrogen, phosphorus and potassium levels; total seasonal rainfall; average temperature; relative humidity; and the final crop yield measured in tonnes per hectare (t/ha). Two machine learning models were then trained and compared: linear regression, selected as a baseline for its simplicity and interpretability, and random forest, chosen for its robustness and higher predictive capability. The results indicated that random forest clearly outperformed linear regression, achieving lower RMSE and MAE along with a higher R² score on the test dataset. Among all input variables, rainfall and potassium availability were identified as the most influential factors affecting turmeric yield.
Introduction
This paper presents a study on turmeric yield prediction in Erode district, India, a major turmeric-producing region that contributes significantly to India’s spice economy. Despite the crop's importance, yield estimation is still largely based on traditional methods such as farmer experience and crop-cutting surveys, which are often inaccurate. This inaccuracy creates problems in planning, pricing, and supply chain management.
To address this, the study explores machine learning-based yield prediction using locally collected agricultural and environmental data. It uses two simple models: Linear Regression (for interpretability) and Random Forest (for better predictive performance). The goal is to show that reliable yield prediction can be achieved using accessible data without advanced technologies like satellite imaging.
The literature review highlights that machine learning models, especially ensemble methods like Random Forest and XGBoost, generally outperform traditional statistical methods in crop yield prediction. However, Linear Regression remains useful as a baseline due to its simplicity and interpretability. Key factors affecting crop yield include soil nutrients (N, P, K), rainfall, temperature, humidity, and soil pH.
The study focuses on Erode’s unique agro-climatic conditions, including red loamy soils, bimodal monsoon rainfall, and the high-curcumin turmeric variety “Erode local.” A dataset of 80 plot-season observations (2015–2024) was created using soil health cards, IMD weather data, farmer surveys, and government agricultural records.
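The comparison described in this section can be sketched as follows. This is a minimal illustration using scikit-learn, not the study's actual pipeline: the data are synthetic placeholders, the column names in `FEATURES`, the helper names `make_synthetic_plots` and `compare_models`, the 70/30 train/test split, and the forest size of 200 trees are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical column names; the real dataset's schema is not given in the text.
FEATURES = ["soil_ph", "nitrogen", "phosphorus", "potassium",
            "rainfall_mm", "temperature_c", "humidity_pct"]


def make_synthetic_plots(n_rows=80, seed=0):
    """Generate placeholder data; NOT the study's data or an agronomic model."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_rows, len(FEATURES)))
    # Arbitrary nonlinear response so that both models have something to fit.
    y = (4.0 + 2.0 * np.sqrt(X[:, 4]) + 1.5 * X[:, 3] * X[:, 4]
         + rng.normal(0.0, 0.2, n_rows))
    return X, y


def compare_models(X, y, seed=0):
    """Fit both models on a 70/30 split and return test-set metrics per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    models = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=200,
                                               random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
            "mae": float(mean_absolute_error(y_test, pred)),
            "r2": float(r2_score(y_test, pred)),
        }
    return results


X, y = make_synthetic_plots()
for name, metrics in compare_models(X, y).items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

With 80 rows, a 30 percent test fraction leaves 56 rows for training and 24 for testing. The numbers printed by this sketch reflect only the synthetic data and say nothing about real turmeric yields.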
Conclusion
This study addresses a practical question: whether turmeric yield in Erode can be predicted using a small, locally collected dataset combined with basic machine learning models. The findings provide a qualified but meaningful affirmative answer. The random forest model, trained on 56 observations and eight input features, achieved strong predictive performance, with an R² of 0.94 and an RMSE of 0.34 t/ha. These results indicate that reliable yield forecasting is achievable even with limited data. The linear regression model, while less accurate (R² = 0.81), still serves as a useful and interpretable baseline that can be applied in low-resource or extension-level contexts.
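For reference, the metrics quoted above are assumed to follow their standard definitions, which the text does not spell out. Here y_i is the observed yield of test observation i in t/ha, ŷ_i is the model's prediction, ȳ is the mean observed yield over the test set, and n is the number of test observations:

```latex
\begin{align*}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \\
R^2 &= 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
               {\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}.
\end{align*}
```

Under these definitions, RMSE and MAE are expressed in the same units as yield, and an R² of 0.94 means the model's total squared error on the test set is 6 percent of the squared error obtained by always predicting the mean test yield.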
Rainfall and potassium availability emerged as the most influential predictors of yield, aligning strongly with established agronomic understanding and reinforcing their central role in turmeric production systems. These findings have direct implications for irrigation scheduling and fertilizer management in the Erode region. This study demonstrates that effective yield prediction does not necessarily require advanced sensing infrastructure or large-scale datasets. With careful data collection, appropriate pre-processing and suitable modeling techniques, practical forecasting systems can be developed even in data-limited agricultural environments. This work is intended as an initial step toward more advanced, scalable and farmer-centric decision-support tools.
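One common way to obtain a ranking of predictors like the one discussed above is the impurity-based importance reported by a fitted random forest. The sketch below assumes that approach; the text does not state which importance measure the study used. The data are synthetic and deliberately constructed so that only rainfall and potassium drive the response, and the column names and the helper `rank_features` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical column names; the real dataset's schema is not given in the text.
FEATURES = ["soil_ph", "nitrogen", "phosphorus", "potassium",
            "rainfall_mm", "temperature_c", "humidity_pct"]


def rank_features(X, y, names, seed=0):
    """Fit a random forest and return (name, importance) pairs, largest first."""
    model = RandomForestRegressor(n_estimators=300, random_state=seed)
    model.fit(X, y)
    importances = model.feature_importances_  # impurity-based, sums to 1
    order = np.argsort(importances)[::-1]
    return [(names[i], float(importances[i])) for i in order]


# Synthetic placeholder data: by construction, only rainfall (column 4) and
# potassium (column 3) influence the response. This is NOT the study's data.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(80, len(FEATURES)))
y = 3.0 * X[:, 4] + 2.0 * X[:, 3] + rng.normal(0.0, 0.1, 80)

ranking = rank_features(X, y, FEATURES)
for name, score in ranking:
    print(f"{name:15s} {score:.3f}")
```

Impurity-based importances can be biased when predictors are correlated or differ widely in the number of distinct values, so on real field data a permutation-based measure computed on held-out observations may be a useful cross-check.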